Path: bloom-beacon.mit.edu!senator-bedfellow.mit.edu!faqserv
From: andrewh@speech.su.oz.au (Andrew Hunt)
Newsgroups: comp.speech,comp.answers,news.answers
Subject: comp.speech Frequently Asked Questions - part 2/3
Supersedes: <comp-speech-faq/part2_764040899@rtfm.mit.edu>
Followup-To: comp.speech
Date: 16 Apr 1994 13:08:02 GMT
Organization: Speech Technology Group, The University of Sydney
Lines: 753
Approved: news-answers-request@MIT.Edu
Expires: 28 May 1994 13:05:48 GMT
Message-ID: <comp-speech-faq/part2_766501548@rtfm.mit.edu>
References: <comp-speech-faq/part1_766501548@rtfm.mit.edu>
Reply-To: andrewh@speech.su.oz.au (Andrew Hunt)
NNTP-Posting-Host: bloom-picayune.mit.edu
Summary: Useful information about Speech Technology
X-Last-Updated: 1994/04/06
Originator: faqserv@bloom-picayune.MIT.EDU
Xref: bloom-beacon.mit.edu comp.speech:2284 comp.answers:4933 news.answers:18147
Archive-name: comp-speech-faq/part2
Last-modified: 1994/04/06
SECTION 2 - Signal Processing for Speech
Q2.1: What sampling do I need for speech?
For recorded speech to be understood by humans you need a sampling rate
of 8kHz or more and samples of at least 8 bits. This produces poor
quality speech, but it can be understood.
Quality can be improved by increasing the sample size to 12 or 16 bits,
or by using a non-linear encoding technique such as mu-law or A-law
(see Q2.7). Either change improves the "signal-to-noise" ratio.
Increasing the sampling rate above 8kHz, say to 10kHz, 16kHz or 20kHz,
improves the frequency response, because a signal sampled at a rate fs
can only represent frequencies below fs/2: the higher the sampling
frequency, the better the high frequency content will be. A 16kHz sampling
rate is a reasonable target for high quality speech recording and playback.
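As a rough guide to these trade-offs, the sketch below computes three textbook figures for a linear (uniform) PCM setup: the usable bandwidth (half the sampling rate), the uncompressed data rate, and the ideal quantisation signal-to-noise ratio of about 6.02 dB per bit plus 1.76 dB for a full-scale sine wave. The struct and function names are hypothetical, the SNR formula does not apply to mu-law or A-law coding, and real recordings fall well short of the ideal figure.

```c
/* Hypothetical helper: textbook figures for a linear (uniform) PCM setup. */
struct pcm_summary {
    double bandwidth_hz;  /* highest representable frequency: fs / 2 (Nyquist) */
    double bit_rate_bps;  /* uncompressed data rate for one channel: fs * bits */
    double snr_db;        /* ideal quantisation SNR for a full-scale sine wave */
};

struct pcm_summary summarise_pcm(double sample_rate_hz, int bits)
{
    struct pcm_summary s;
    s.bandwidth_hz = sample_rate_hz / 2.0;
    s.bit_rate_bps = sample_rate_hz * (double)bits;
    s.snr_db = 6.02 * (double)bits + 1.76;  /* roughly 6 dB per extra bit */
    return s;
}
```

For example, 8kHz with 8-bit linear samples gives 4000 Hz of bandwidth, 64000 bit/s and about 49.9 dB, while 16kHz with 16-bit samples gives 8000 Hz, 256000 bit/s and about 98.1 dB.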
When doing speech recognition, remember that your computer is not as
good as your ear, so it will have more trouble with poor quality sounds.
The choice of an appropriate sampling setup depends very much on the
speech recognition task and the amount of computer power available.
------------------------------------------------------------------------
Q2.2: How do I find the pitch of a speech signal?
This topic comes up regularly in the comp.dsp newsgroup. Question 2.5
of the FAQ posting for comp.dsp gives a comprehensive list of references
on the definition, perception and processing of pitch.
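The comp.dsp references cover many methods. As one simple illustration (not taken from that FAQ), the sketch below estimates the pitch of a single voiced frame by choosing the lag with the largest normalised autocorrelation within a plausible pitch range. The function name and parameters are hypothetical; practical pitch trackers add voicing decisions, windowing and safeguards against locking onto multiples of the true period, which is why the search range matters here.

```c
#include <stddef.h>

/* Autocorrelation pitch estimate for one frame x[0..n-1] sampled at fs Hz.
 * Only lags corresponding to f0_min..f0_max Hz are searched.  Returns the
 * estimated fundamental in Hz, or 0.0 if no positive peak was found or the
 * frame is too short for the requested range. */
double estimate_pitch_hz(const double *x, size_t n, double fs,
                         double f0_min, double f0_max)
{
    size_t lag_min = (size_t)(fs / f0_max);
    size_t lag_max = (size_t)(fs / f0_min);
    size_t best_lag = 0;
    double best_r = 0.0;

    if (lag_min < 1) lag_min = 1;
    if (lag_max >= n) return 0.0;  /* frame shorter than the longest period */

    for (size_t lag = lag_min; lag <= lag_max; lag++) {
        double r = 0.0;
        for (size_t i = 0; i + lag < n; i++)
            r += x[i] * x[i + lag];
        r /= (double)(n - lag);  /* normalise: fewer products at larger lags */
        if (r > best_r) {
            best_r = r;
            best_lag = lag;
        }
    }
    return best_lag ? fs / (double)best_lag : 0.0;
}
```

Called on an 800-sample frame at 8000 Hz with a search range of 150 to 400 Hz, a sawtooth with a period of 40 samples yields 200 Hz.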
------------------------------------------------------------------------
Q2.3: How do I find the start and end points of a speech signal?
A large number of papers have been presented on this task. Try the
following papers:-
Rabiner LR, Sambur MR, "An Algorithm for Determining the Endpoints
of Isolated Utterances", Bell System Technical Journal, Vol 54,
No. 2, pp 297-315, 1975.
Drago, P.G. et al. "Digital Dynamic Speech Detectors." IEEE Trans on
Communications, Vol 26, No 1, Jan 78, pp. 140-145.
Newman, W.C. "Detecting Speech with an Adaptive Neural Network."
Electronic Design. 22 March 1990.
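The papers above describe more robust methods. A minimal sketch of the basic idea, assuming a reasonably quiet recording, is to split the signal into frames, compute each frame's mean energy and keep the span from the first to the last frame above a threshold. This is a simplification, not the Rabiner-Sambur algorithm (which also uses zero-crossing rates); the function name and the fixed threshold are assumptions, and in practice the threshold is usually derived from an estimate of the background noise level.

```c
#include <stddef.h>

/* Simplified energy-based endpoint detector over non-overlapping frames of
 * frame_len samples.  A frame counts as speech if its mean energy exceeds
 * threshold.  On success, *start is the first sample of the first speech
 * frame, *end is one past the last sample of the last speech frame, and the
 * function returns 1.  It returns 0 if no frame is above the threshold. */
int find_endpoints(const double *x, size_t n, size_t frame_len,
                   double threshold, size_t *start, size_t *end)
{
    int found = 0;
    if (frame_len == 0) return 0;
    for (size_t f = 0; f + frame_len <= n; f += frame_len) {
        double energy = 0.0;
        for (size_t i = f; i < f + frame_len; i++)
            energy += x[i] * x[i];
        energy /= (double)frame_len;       /* mean energy of this frame */
        if (energy > threshold) {
            if (!found) *start = f;        /* first frame above threshold */
            *end = f + frame_len;          /* extend to the latest such frame */
            found = 1;
        }
    }
    return found;
}
```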
------------------------------------------------------------------------
Q2.4: Where can I find FFT software?
Try the following file - available by anonymous ftp :-
usc.edu:/pub/C-numanal/fft-stuff.tar.gz
It contains a series of optimised fft routines, including mixed-radix
algorithms. Note that the .gz suffix indicates GNU zip format.
------------------------------------------------------------------------
Q2.5: What signal processing techniques are used in speech technology?
This question is far too big to be answered in a FAQ posting. Fortunately,
there are many good books which answer it!
Some good introductory books include
Digital processing of speech signals; L. R. Rabiner, R. W. Schafer.
Englewood Cliffs; London: Prentice-Hall, 1978
Voice and Speech Processing; T. W. Parsons.
New York; McGraw Hill 1986
Computer Speech Processing; ed Frank Fallside, William A. Woods
Englewood Cliffs: Prentice-Hall, c1985
Digital speech processing : speech coding, synthesis, and recognition
edited by A. Nejat Ince; Kluwer Academic Publishers, Boston, c1992
Speech science and technology; edited by Shuzo Saito
pub. Ohmsha, Tokyo, c1992
Speech analysis; edited by Ronald W. Schafer, John D. Markel
New York, IEEE Press, c1979
Douglas O'Shaughnessy -- Speech Communication: Human and Machine
Addison Wesley series in Electrical Engineering: Digital Signal Processing,
1987.
------------------------------------------------------------------------
Q2.6: What speech sampling and signal processing hardware can I use?
In addition to the following information, have a look at the Audio File
format document prepared by Guido van Rossum (see details in Section 1.7).
Product: Sun standard audio port (SPARC 1 & 2)
Input: 1 channel, 8 bit mu-law encoded (telephone quality)
Output: 1 channel, 8 bit mu-law encoded (telephone quality)
Product: Ariel
Platform: Sun + others?
Input: 2 channels, 16bit linear, sample rate 8-96kHz (inc 32, 44.1, 48kHz).
Output: 2 channels, 16bit linear, sample rate 8-50kHz (inc 32, 44.1, 48kHz).
Contact: Ariel Corp., 433 River Road,
Highland Park, NJ 08904.
Ph: 908-249-2900 Fax: 908-249-2123 DSP BBS: 908-249-2124
Product: IBM RS/6000 ACPA (Audio Capture and Playback Adapter)
Description: The card supports PCM, Mu-Law, A-Law and ADPCM at 44.1kHz
(& 22.05, 11.025, 8kHz) with 16-bits of resolution in stereo.
The card has a built-in DSP (don't know which one). The device
also supports various formats for the output data, like big-endian,
two's complement, etc. Good noise immunity.
The card is used for IBM's VoiceServer (they use the DSP for
speech recognition). Apparently, the IBM VoiceServer has a
speaker-independent vocabulary of over 20,000 words and each
ACPA can support two independent sessions at once.
Cost: $US495
Contact: ?
Product: Sound Galaxy NX, Aztech Systems
Platform: PC - DOS, Windows 3.1
Cost: ??
Input: 8bit linear, 4-22 kHz.
Output: 8bit linear, 4-44.1 kHz
Misc: 11-voice FM Music Synthesizer YM3812; Built-in power amplifier;
DSP signal processing support - ST70019SB
Hardware ADPCM decompression (2:1,3:1,4:1)
Full "AdLib" and "Sound Blaster" compatibility.
Software includes a simple Text-to-Speech program "Monologue".
Product: Sound Galaxy NX PRO, Aztech Systems
Platform: PC - DOS, Windows 3.1
Cost: ??
Input: 2 * 8bit linear, 4-22.05 kHz (stereo), 4-44.1 kHz (mono).
Output: 2 * 8bit linear, 4-44.1 kHz (stereo/mono)
Misc: 20-voice FM Music Synthesizer; Built-in power amplifier;
Stereo Digital/Analog Mixer; Configuration in EEPROM.
Hardware ADPCM decompression (2:1,3:1,4:1).
Includes DSP signal processing support
Full "AdLib" and "Sound Blaster Pro II" compatibility.
Software includes a simple Text-to-Speech program "Monologue"
and Sampling laboratory for Windows 3.1: WinDAT.
Contact: USA (510)6238988
Product Name: ATI Stereo F/X Sound Board
Platform: PC XT or AT - DOS, Windows 3.0, 3.1
Cost: $120 Canadian
Description:
Input - 8 bit ADC, 44.1 kHz mono, 22.05 kHz Stereo.
Output - Dynamic range = 48 dB, 32 anti-aliasing filters
Adds Stereo effect to existing mono Adlib or Sound Blaster apps.
11-voice YAMAHA FM Music Synthesizer
Built-in 8 watt power amplifier, 4 watts per channel.
Volume ctrl on rear.
2 Joystick input, software setup (no switches), software included.
"AdLib" and "Sound Blaster" compatibility.
DMA support for high speed digital audio.
ADPCM decomp @ 4:1, 3:1, 2:1. Will play .WAV files.
Optional MIDI I/O port $79. (MIDI IN, OUT, THRU, and sequencer).
Contact: ATI Technologies Inc.
3761 Victoria Park Avenue
Scarborough, Ontario
CANADA, M1W 3S2
Ph: (416) 756-0711 Fax: (416) 756-0720
BBS: (416) 764-9404 (9600 baud N.8.1)
Other PC Sound Cards
==========================================================================================
Sound card    Stereo/mono &            Compatible          Included            Voices
              sample rate              with                ports
==========================================================================================
Adlib Gold    stereo: 8-bit 44.1kHz    Adlib, ?            audio in/out,       20 (opl3)
1000                  16-bit 44.1kHz                       mic in,             +2 digital
              mono:   8-bit 44.1kHz                        joystick, MIDI      channels
                      16-bit 44.1kHz
Sound Blaster mono:   8-bit 22.1kHz    Adlib               audio in/out,       11 synth.
FM synth with                                              joystick
2 operators
Sound Blaster stereo: 8-bit 22.05kHz   Adlib,              audio in/out,       22
Pro Basic     mono:   8-bit 44.1kHz    Sound Blaster       joystick
Sound Blaster stereo: 8-bit 22.05kHz   Adlib,              audio in/out,       11
Pro           mono:   8-bit 44.1kHz    Sound Blaster       joystick, MIDI,
                                                           SCSI
Sound Blaster stereo: 8-bit 4-44.1kHz  Sound Blaster       audio in/out,       20
16 ASP        stereo: 16-bit 4-44.1kHz                     joystick, MIDI
Audio Port    mono:   8-bit 22.05kHz   Adlib,              audio in/out,       11
                                       Sound Blaster       joystick
Pro Audio     stereo: 8-bit 44.1kHz    Adlib,              audio in/out,       20
Spectrum +                             Pro Audio Spectrum  joystick
Pro Audio     stereo: 16-bit 44.1kHz   Adlib,              audio in/out,       20
Spectrum 16                            Pro Audio Spectrum, joystick, MIDI,
                                       Sound Blaster       SCSI
Thunder Board stereo: 8-bit 22kHz      Adlib,              audio in/out,       11
                                       Sound Blaster       joystick
Gravis        stereo: 8-bit 44.1kHz    Adlib,              audio line in/out,  32 sampled,
Ultrasound    mono:   8-bit 44.1kHz    Sound Blaster       amplified out,      32 synth.
              (w/16-bit daughtercard)                      mic in,
              stereo: 16-bit 44.1kHz                       CD audio in,
              mono:   16-bit 44.1kHz                       daughterboard ports
                                                           (for SCSI and
                                                           16-bit)
MultiSound    stereo: 16-bit 44.1kHz   Nothing             audio in/out,       32 sampled
              64x oversampling                             joystick, MIDI
==========================================================================================
Can anyone provide information on Mac, NeXT and other hardware?
Product: xxx
Platform: PC, Mac, Sun, ...
Rough Cost (pref $US):
Input: e.g. 16bit linear, 8,10,16,32kHz.
Output: e.g. 16bit linear, 8,10,16,32kHz.
DSP: signal processing support
Other:
Contact:
------------------------------------------------------------------------
Q2.7: How do I convert to/from mu-law format?
Mu-law coding is a form of compression for audio signals including speech.
It is widely used in the telecommunications field because it improves the
signal-to-noise ratio without increasing the amount of data. Typically,
mu-law compressed speech is carried in 8-bit samples. It is a companding
technique: it carries more information about smaller signals than about
larger signals, because small amplitudes are coded with finer quantization
steps. Mu-law coding is provided as standard for the audio input and output
of the SUN Sparc stations 1&2 (Sparc 10s are linear).
On SUN Sparc systems, have a look in the directory /usr/demo/SOUND, which
includes table lookup macros for ulaw conversions. [Note, however, that not
all systems will have /usr/demo/SOUND installed, as it is optional; see your
system administrator if it is missing.]
Alternatively, here is some sample conversion code in C.
unsigned char linear2ulaw(int sample);
int ulaw2linear(unsigned char ulawbyte);
/*
** This routine converts from linear to ulaw.
**
** Craig Reese: IDA/Supercomputing Research Center
** Joe Campbell: Department of Defense
** 29 September 1989
**
** References:
** 1) CCITT Recommendation G.711 (very difficult to follow)
** 2) "A New Digital Technique for Implementation of Any
** Continuous PCM Companding Law," Villeret, Michel,
** et al. 1973 IEEE Int. Conf. on Communications, Vol 1,
** 1973, pg. 11.12-11.17
** 3) MIL-STD-188-113,"Interoperability and Performance Standards
** for Analog-to-Digital Conversion Techniques,"
** 17 February 1987
**
** Input: Signed 16 bit linear sample
** Output: 8 bit ulaw sample
*/
#define ZEROTRAP /* turn on the trap as per the MIL-STD */
#undef ZEROTRAP /* the trap is off by default; delete this line to enable it */
#define BIAS 0x84 /* define the add-in bias for 16 bit samples */
#define CLIP 32635 /* largest magnitude such that magnitude + BIAS fits in 15 bits */
unsigned char linear2ulaw(int sample)
{
    /* exp_lut[i] = floor(log2(i)) (0 for i = 0): the segment (exponent) for
       a biased magnitude whose bits 7-14 hold the value i */
    static const int exp_lut[256] = {0,0,1,1,2,2,2,2,3,3,3,3,3,3,3,3,
        4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
        5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,
        5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,
        6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
        6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
        6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
        6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
        7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
        7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
        7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
        7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
        7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
        7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
        7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
        7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7};
    int sign, exponent, mantissa;
    unsigned char ulawbyte;
    /* Get the sample into sign-magnitude. */
    sign = (sample < 0) ? 0x80 : 0;   /* set aside the sign (portable: no shift of a negative int) */
    if (sign != 0) sample = -sample;  /* get magnitude */
    if (sample > CLIP) sample = CLIP; /* clip the magnitude */
    /* Convert from 16 bit linear to ulaw. */
    sample = sample + BIAS;
    exponent = exp_lut[(sample >> 7) & 0xFF];
    mantissa = (sample >> (exponent + 3)) & 0x0F;
    ulawbyte = (unsigned char)~(sign | (exponent << 4) | mantissa); /* ulaw bytes are stored inverted */
#ifdef ZEROTRAP
    if (ulawbyte == 0) ulawbyte = 0x02; /* optional CCITT trap */
#endif
    return ulawbyte;
}
/*
** This routine converts from ulaw to 16 bit linear.
**
** Craig Reese: IDA/Supercomputing Research Center
** 29 September 1989
**
** References:
** 1) CCITT Recommendation G.711 (very difficult to follow)
** 2) MIL-STD-188-113,"Interoperability and Performance Standards
** for Analog-to-Digital Conversion Techniques,"
** 17 February 1987
**
** Input: 8 bit ulaw sample
** Output: signed 16 bit linear sample
*/
int ulaw2linear(unsigned char ulawbyte)
{
    /* exp_lut[e] = (132 << e) - 132: start of segment e with BIAS removed */
    static const int exp_lut[8] = { 0, 132, 396, 924, 1980, 4092, 8316, 16764 };
    int sign, exponent, mantissa, sample;
    ulawbyte = (unsigned char)~ulawbyte; /* undo the bit inversion */
    sign = (ulawbyte & 0x80);
    exponent = (ulawbyte >> 4) & 0x07;
    mantissa = ulawbyte & 0x0F;
    sample = exp_lut[exponent] + (mantissa << (exponent + 3));
    if (sign != 0) sample = -sample;
    return sample;
}
=======================================================================
SECTION 3 - Speech Coding and Compression
Q3.1: Speech compression techniques.
Can anyone provide a 1-2 page summary on speech compression? Topics to
cover might include common techniques, where speech compression might be
used and perhaps something on why speech is difficult to compress.
[The FAQ for comp.compression includes a few questions and answers
on the compression of speech.]
------------------------------------------------------------------------
Q3.2: What are some good references/books on coding/compression?
Douglas O'Shaughnessy -- Speech Communication: Human and Machine
Addison Wesley series in Electrical Engineering: Digital Signal
Processing, 1987.
Bishnu Atal in ed. Fallside, F. and W. Woods, ed. Computer Speech
Processing. London: Prentice/Hall International, 1985.
Makhoul, J. "Linear Prediction: A Tutorial Review." Proc. of the
IEEE 63 (1975): 561 - 580.
------------------------------------------------------------------------
Q3.3: What software is available?
Note: two types of speech compression technique are referred to below.
Lossless techniques preserve the speech exactly through a compression-
decompression cycle. Lossy techniques do not preserve the speech perfectly.
As a general rule, the more you compress speech, the more the quality degrades.
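To make the lossless case concrete, here is a minimal illustration (not the method of any particular package below): each 16-bit sample is replaced by its difference from the previous sample. Because neighbouring speech samples are strongly correlated, the differences tend to be small and can then be stored in fewer bits by an entropy coder (not shown), yet decoding recovers every sample exactly. The function names are hypothetical.

```c
#include <stddef.h>

/* First-order fixed predictor: res[i] = in[i] - in[i-1] (with in[-1] = 0).
 * Residuals are int because the difference of two 16-bit values can need
 * 17 bits. */
void diff_encode(const short *in, int *res, size_t n)
{
    int prev = 0;
    for (size_t i = 0; i < n; i++) {
        res[i] = in[i] - prev;  /* prediction error */
        prev = in[i];
    }
}

/* Exact inverse: a running sum of the residuals rebuilds the samples. */
void diff_decode(const int *res, short *out, size_t n)
{
    int prev = 0;
    for (size_t i = 0; i < n; i++) {
        prev += res[i];         /* undo the prediction */
        out[i] = (short)prev;
    }
}
```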
Package: File format conversion
Platform: SUN OS?
Description: Conversion utility able to encode and decode between the
following formats: G.723, G.721, A-law, u-law and linear.
Availability: By anonymous ftp from
ftp.cwi.nl:/pub/audio/ccitt-adpcm.tar.Z
Package: shorten - a lossless compressor for speech signals
Platform: UNIX/DOS
Description: A lossless compressor for speech signals. It will compile and
run on UNIX workstations and will cope with a wide variety of
formats. Compression is typically 50% for 16bit clean speech
sampled at 16kHz.
Availability: Anonymous ftp - Portable UNIX version is
svr-ftp.eng.cam.ac.uk:/comp.speech/sources/shorten-1.11.tar.Z
Unsupported DOS version is
svr-ftp.eng.cam.ac.uk:/comp.speech/sources/shn109.exe
Package: CELP 3.2a & LPC
Platform: Sun (the makefiles & source can be modified for other platforms)
Description: CELP is a lossy compression technique.
The U.S. DoD's Federal-Standard-1016 based 4800 bps code excited
linear prediction voice coder version 3.2a (CELP 3.2a) Fortran and
C simulation source codes. Available for worldwide distribution
(on DOS diskettes, but configured to compile on Sun SPARC stations)
from NTIS and DTIC. Example input and processed speech files are
included. A Technical Information Bulletin (TIB), "Details to Assist
in Implementation of Federal Standard 1016 CELP," and the official
standard, "Federal Standard 1016, Telecommunications: Analog to
Digital Conversion of Radio Voice by 4,800 bit/second Code Excited
Linear Prediction (CELP)," are also available.
Availability 1: Through the National Technical Information Service:
NTIS
U.S. Department of Commerce
5285 Port Royal Road,
Springfield, VA 22161, USA
The "AD" ordering number for the CELP software is AD M000 118
(US$ 90.00) and for the TIB it's AD A256 629 (US$ 17.50).
The LPC-10 standard, described below, is FIPS Pub 137 (US$ 12.50).
There is a $3.00 shipping charge on all U.S. orders. The telephone
number for their automated system is 703-487-4650, or 703-487-4600
if you'd prefer to talk with a real person.
(U.S. DoD personnel and contractors can receive the package from the
Defense Technical Information Center: DTIC, Building 5, Cameron
Station, Alexandria, VA 22304-6145. Their telephone number is
703-274-7633.)
Availability 2: By anonymous ftp from:
super.org (192.31.192.1):/pub/celp_3.2a.tar.Z
OR
svr-ftp.eng.cam.ac.uk:comp.speech/sources/celp_3.2a.tar.Z
Misc: The following articles describe the Federal-Standard-1016 4.8-kbps
CELP coder (it's unnecessary to read more than one):
Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch,
"The Federal Standard 1016 4800 bps CELP Voice Coder," Digital Signal
Processing, Academic Press, 1991, Vol. 1, No. 3, p. 145-155.
Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch,
"The DoD 4.8 kbps Standard (Proposed Federal Standard 1016),"
in Advances in Speech Coding, ed. Atal, Cuperman and Gersho,
Kluwer Academic Publishers, 1991, Chapter 12, p. 121-133.
Campbell, Joseph P. Jr., Thomas E. Tremain and Vanoy C. Welch, "The
Proposed Federal Standard 1016 4800 bps Voice Coder: CELP," Speech
Technology Magazine, April/May 1990, p. 58-64.
* The U.S. DoD's Federal-Standard-1015/NATO-STANAG-4198 based 2400
bps linear prediction coder (LPC-10) was republished as a Federal
Information Processing Standards Publication 137 (FIPS Pub 137).
It is described in:
Thomas E. Tremain, "The Government Standard Linear Predictive Coding
Algorithm: LPC-10," Speech Technology Magazine, April 1982, p. 40-49.
There is also a section about FS-1015 in the book:
Panos E. Papamichalis, Practical Approaches to Speech Coding,
Prentice-Hall, 1987.
* The voicing classifier used in the enhanced LPC-10 (LPC-10e) is
described in: Campbell, Joseph P., Jr. and T. E. Tremain, "Voiced/
Unvoiced Classification of Speech with Applications to the U.S.
Government LPC-10E Algorithm," Proceedings of the IEEE International
Conf. on Acoustics, Speech, and Signal Processing, 1986, p. 473-6.
* Copies of the official standard, "Federal Standard 1016, Tele-
communications: Analog to Digital Conversion of Radio Voice by 4,800
bit/second Code Excited Linear Prediction (CELP)" are available for
US$ 5.00 each from:
GSA Federal Supply Service Bureau
Specification Section, Suite 8100
470 E. L'Enfant Place, S.W.
Washington, DC 20407
(202)755-0325
* Realtime DSP code for FS-1015 and FS-1016 is sold by:
John DellaMorte, DSP Software Engineering
165 Middlesex Tpk, Suite 206
Bedford, MA 01730, USA
Ph: 1-617-275-3733 Fax: 1-617-275-4323
dspse.bedford@channel1.com
* DSP Software Engineering's FS-1016 code can run on a DSP Research's
Tiger 30 (a PC board with a TMS320C3x and analog interface suited
to development work).
DSP Research
1095 E. Duane Ave.
Sunnyvale, CA 94086, USA
Ph: (408)773-1042 Fax: (408)736-3451
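Both CELP and LPC-10 are built on linear prediction: each speech sample is predicted as a weighted sum of the previous p samples, and the coder transmits the weights (or parameters derived from them) for each frame. As a sketch of how such weights can be computed, here is the textbook autocorrelation method with the Levinson-Durbin recursion (see also the Makhoul tutorial in Q3.2). It is not code from the FS-1016 or FS-1015 distributions: it applies no window and no quantisation, and its name and order limit of 32 are assumptions.

```c
#include <stddef.h>

/* Order-p LPC by the autocorrelation method and Levinson-Durbin recursion.
 * On return a[1..p] hold coefficients such that x[n] is approximated by
 * sum_{k=1..p} a[k] * x[n-k]; a[0] is unused and set to 0.  Returns the
 * final prediction error energy, or -1.0 for bad arguments or a silent
 * frame.  a must have room for p+1 doubles. */
double lpc_levinson(const double *x, size_t n, int p, double *a)
{
    double r[33], tmp[33];
    double err;

    if (p < 1 || p > 32 || n <= (size_t)p) return -1.0;

    /* Autocorrelation r[0..p] of the (unwindowed) frame. */
    for (int k = 0; k <= p; k++) {
        r[k] = 0.0;
        for (size_t i = (size_t)k; i < n; i++)
            r[k] += x[i] * x[i - k];
    }
    if (r[0] == 0.0) return -1.0;

    err = r[0];
    for (int k = 0; k <= p; k++) a[k] = 0.0;

    for (int i = 1; i <= p; i++) {
        /* Reflection (PARCOR) coefficient for order i. */
        double acc = r[i];
        for (int j = 1; j < i; j++)
            acc -= a[j] * r[i - j];
        double k_i = acc / err;

        /* a_j(i) = a_j(i-1) - k_i * a_{i-j}(i-1), then a_i(i) = k_i. */
        for (int j = 1; j < i; j++)
            tmp[j] = a[j] - k_i * a[i - j];
        for (int j = 1; j < i; j++)
            a[j] = tmp[j];
        a[i] = k_i;

        err *= (1.0 - k_i * k_i);  /* prediction error shrinks each order */
    }
    return err;
}
```

For a decaying exponential x[n] = 0.9^n, an order-2 analysis gives a[1] close to 0.9 and a[2] close to zero, as expected for a first-order process.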
Package: 32 kbps ADPCM
Platform: SGI and Sun Sparcs
Description: 32 kbps ADPCM C-source code (G.721 compatibility is uncertain)
Contact: Jack Jansen
Availability: Anonymous ftp to ftp.cwi.nl: pub/adpcm.shar
Package: GSM 06.10 Compression
Platform: Runs faster than real time on most Sun SPARCstations
Description: GSM 06.10 is a lossy compression technique.
European GSM 06.10 provisional standard for full-rate speech
transcoding, prI-ETS 300 036, which uses RPE/LTP (regular
pulse excitation/long term prediction) coding at 13 kbit/s.
Contact: Carsten Bormann <cabo@cs.tu-berlin.de>
Availability: An implementation can be ftp'ed from:
tub.cs.tu-berlin.de: /pub/tubmik/gsm-1.0.tar.Z
+/pub/tubmik/gsm-1.0-patch1
or as a faster but not always up-to-date alternative:
liasun3.epfl.ch: /pub/audio/gsm-1.0pl1.tar.Z
Package: G.721/722/723 Compression
Description: ?
Availability: By email to teledoc@itu.arcom.ch, with
GET ITU-3022
as the *only* line in the body of the message.
This is also available by anonymous ftp from:
svr-ftp.eng.cam.ac.uk:comp.speech/sources/G711_G722_G723.tar.Z
Package: U.S.F.S. 1016 CELP vocoder for DSP56001
Platform: DSP56001
Description: Real-time U.S.F.S. 1016 CELP vocoder that runs on a single
27MHz Motorola DSP56001. Free demo software available from PC-56
and PC-56D. Source and object code available for a one-time
license fee.
Contact: Cole Erskine
Analogical Systems
2916 Ramona St.
Palo Alto, CA 94306, USA
Tel:(415) 323-3232 FAX:(415) 323-4222
Internet: cole@analogical.com
Product: 8 Kbit/s CELP on the TMS320C5x family of DSP chips.
Description: For low bandwidth transmission of voice, compact voice storage
for archival purposes, low-cost digital answering machines and
efficient storage for voice mail. Features :-
- near toll quality at 8 Kb/s.
- Variable rate option with 1 Kb/s silence encoding
- Implemented on a fixed-point processor for lower system cost.
- Attractive licensing scheme.
- Future availability of 4 Kb/s.
- Custom rates possible.
Capacity :-
- Two half-duplex or one full duplex channels on the 20 MIPS 'C5x
(at 95% and 55% CPU utilization respectively).
- Two full duplex channels on the 28.6 MIPS 'C5x
(at 77% CPU utilization).
- Requires 9 K-words program memory and 3 K-words data memory.
- Decoding in real-time on a 486 class CPU.
Contact: CVI Inc.
443 Vienna Cres. North Vancouver, BC, Canada V7N 3B3
Tel: (604) 987 1719 Fax: (604) 986 8139
Email: cvi@extropia.wimsey.com
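The bit rates quoted in this section can be compared against 16-bit linear PCM at 8kHz, which needs 128000 bit/s. The sketch below simply performs that division. The names are hypothetical; the 64000 bit/s mu-law figure assumes 8-bit samples at 8kHz (telephone practice), and the ratios say nothing about the large differences in speech quality between these coders.

```c
#include <stdio.h>

/* Reference rate: 16-bit linear PCM at 8 kHz = 8000 * 16 bit/s. */
#define PCM16_8K_BPS 128000.0

/* Compression ratio of a coder relative to 16-bit, 8 kHz linear PCM. */
double ratio_vs_pcm16(double coder_bps)
{
    return PCM16_8K_BPS / coder_bps;
}

/* Print the ratio for each bit rate mentioned in this section. */
void print_coder_ratios(void)
{
    static const struct { const char *name; double bps; } coders[] = {
        { "8-bit mu-law PCM (8 kHz)", 64000.0 },
        { "32 kbps ADPCM",            32000.0 },
        { "GSM 06.10 RPE/LTP",        13000.0 },
        { "CVI 8 Kbit/s CELP",         8000.0 },
        { "FS-1016 CELP",              4800.0 },
        { "FS-1015 LPC-10",            2400.0 },
    };
    for (size_t i = 0; i < sizeof coders / sizeof coders[0]; i++)
        printf("%-26s %7.0f bit/s  %5.1f : 1\n",
               coders[i].name, coders[i].bps, ratio_vs_pcm16(coders[i].bps));
}
```

This gives ratios of 2, 4, about 9.8, 16, about 26.7 and about 53.3 to 1 respectively.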
=======================================================================
SECTION 4 - Natural Language Processing
There is now a newsgroup specifically for Natural Language Processing.
It is called comp.ai.nat-lang.
There is also a lot of useful information on Natural Language Processing
in the FAQ for comp.ai. That FAQ lists available software and useful
references. It includes a substantial list of software, documentation
and other info available by ftp.
------------------------------------------------------------------------
Q4.1: What are some good references/books on NLP?
Take a look at the FAQ for the "comp.ai" newsgroup as it also includes some
useful references.
James Allen: Natural Language Understanding. (Benjamin/Cummings Series in
Computer Science) Menlo Park: Benjamin/Cummings Publishing Company, 1987.
This book consists of four parts: syntactic processing, semantic
interpretation, context and world knowledge, and response generation.
G. Gazdar and C. Mellish, Natural Language Processing in {Prolog/Lisp/Pop11},
Addison Wesley, 1989
Emphasis on parsing, especially unification-based parsing, lots of
details on the lexicon, feature propagation, etc. Fair coverage of
semantic interpretation, inference in natural language processing,
and pragmatics; much less extensive than in Allen's book, but more
formal. There are three versions, one for each programming language
listed above, with complete code.
Shapiro, Stuart C.: Encyclopedia of Artificial Intelligence Vol.1 and 2.
New York: John Wiley & Sons, 1990.
There are articles on the different areas of natural language
processing which also give additional references.
Paris, Cécile L.; Swartout, William R.; Mann, William C.: Natural Language
Generation in Artificial Intelligence and Computational Linguistics. Boston:
Kluwer Academic Publishers, 1991.
The book describes the most current research developments in natural
language generation and discusses all aspects of the generation process.
It comprises three sections: one on text planning, one on lexical
choice, and one on grammar.
Readings in Natural Language Processing, ed by B. Grosz, K. Sparck Jones
and B. Webber, Morgan Kaufmann, 1986
A collection of classic papers on Natural Language Processing.
Fairly complete at the time the book came out (1986) but now
seriously out of date. Still useful for ATNs, etc.
Klaus K. Obermeier, Natural Language Processing Technologies
in Artificial Intelligence: The Science and Industry Perspective,
Ellis Horwood Ltd, John Wiley & Sons, Chichester, England, 1989.
The major journals of the field are "Computational Linguistics" and
"Cognitive Science" for the artificial intelligence aspects, "Cognition"
for the psychological aspects, "Language", "Linguistics and Philosophy" and
"Linguistic Inquiry" for the linguistic aspects. "Artificial Intelligence"
occasionally has papers on natural language processing.
The major conferences are ACL (held every year) and COLING (held every two
years). Most AI conferences have an NLP track; AAAI, ECAI, IJCAI and the
Cognitive Science Society conferences usually are the most interesting for
NLP. CUNY is an important psycholinguistic conference. There are lots of
linguistic conferences: the most important seem to be NELS, the conference
of the Chicago Linguistic Society (CLS), WCCFL, LSA, the Amsterdam Colloquium,
and SALT.
------------------------------------------------------------------------
Q4.2: What NLP software is available?
The FAQ for the "comp.ai" newsgroup lists a variety of language processing
software that is available. That FAQ is posted monthly.
Natural Language Software Registry (NLSR)
=========================================
The Natural Language Software Registry is available from the German Research
Institute for Artificial Intelligence (DFKI) in Saarbruecken. Its purpose
is to facilitate the exchange and evaluation of natural language processing
software within the research community. To this end, the NLSR is
cataloging natural language software projects, both commercial and non-
commercial. The new, updated and enlarged version contains more than 100
descriptions of natural language processing software. Registry listings include:
+ speech signal processors, such as the Computerized Speech Lab
(Kay Elemetrics)
+ morphological analyzers, such as PC-KIMMO
(Summer Institute for Linguistics)
+ parsers, such as Alveytools (University of Edinburgh)
+ semantic and pragmatic analyzers, such as NLL
(University of the Saarland, Germany)
+ generation programs, such as FUF
(Ben Gurion University of the Negev)
+ knowledge representation systems, such as Rhet
(University of Rochester)
+ multicomponent systems, such as ELU (ISSCO), PENMAN (ISI),
Pundit (UNISYS), SNePS (SUNY Buffalo)
+ NLP-Tools, such as GULP (University of Georgia) or Linguist
(Kansai Research Laboratory)
+ applications programs (misc.)
If you have developed a piece of software for natural language
processing that other researchers might find useful, you can include
it by returning the questionnaire available from the sources below.
ftp: Germany: ftp.dfki.uni-sb.de (134.96.188.252)
(directory: pub/registry, password:anonymous)
e-mail: registry@dfki.uni-sb.de
post: Natural Language Software Registry
Deutsches Forschungsinstitut fuer Kuenstliche Intelligenz (DFKI)
Stuhlsatzenhausweg 3
D-66123 Saarbruecken
Germany
Other ftp sites are
crlftp.nmsu.edu (128.123.1.33)
The directory is pub/non-lexical/NL_Software_Registy
dri.cornell.edu (128.84.180.39)
The directory is /pub/Natural_Language_Software_Registry
or /pub/NLSR
Andrew Hunt
Speech Technology Research Group Ph: 61-2-692 4509
Dept. of Electrical Engineering Fax: 61-2-692 3847
University of Sydney, NSW, 2006, Australia email: andrewh@speech.su.oz.au